Offline_Distributional_RL__NeurIPS_2021_Submission_ (6)

Neural Information Processing Systems

We give a proof in Appendix A.5. As we discuss in Appendix A.6, we can use this result to obtain the stated guarantee. First, we apply Lemma 3.4, and then Lemma A.1, which holds with high probability, to bound the relevant CDFs. To show the claim, it suffices to show that the stated inequality holds for all sufficiently large k; the claim then follows by taking the limit k → ∞. We first prove a bound on the concentration of the empirical CDF around the true CDF, and then proceed by bounding the two terms in the summation.
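
The final step mentioned above, a bound on the concentration of the empirical CDF around the true CDF, is the kind of statement usually obtained from a Dvoretzky–Kiefer–Wolfowitz-type inequality. The form below is only an illustration of that standard bound; the paper's Lemma A.1 may use different constants or a pointwise version.

```latex
% Illustrative DKW inequality (not necessarily the exact bound used in the paper):
% for i.i.d. samples X_1, ..., X_n with CDF F and empirical CDF \hat F_n,
\Pr\Big(\sup_{x \in \mathbb{R}} \big|\hat F_n(x) - F(x)\big| > \varepsilon\Big)
  \;\le\; 2\exp\!\big(-2 n \varepsilon^2\big)
  \qquad \text{for all } \varepsilon > 0.
```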


Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem

Guin, Soumyajit, Bhatnagar, Shalabh

arXiv.org Artificial Intelligence

In this paper we propose two algorithms in the tabular setting and an algorithm for the function approximation setting for the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost criteria in RL can be formulated in the setting of SSP. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.
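
For readers unfamiliar with the setting, the sketch below shows a standard tabular method (Q-learning with unit discount) on a cost-minimizing SSP with an absorbing goal state. It is a baseline illustration only, not the authors' proposed algorithms; the environment interface `step(s, a)`, the step size, and the exploration scheme are all assumptions.

```python
import numpy as np

def q_learning_ssp(step, n_states, n_actions, goal, episodes=2000,
                   alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning for a cost-minimizing SSP with an absorbing goal.

    `step(s, a)` is assumed to return (next_state, cost). The discount is 1,
    as is standard for SSP; reaching the goal terminates the episode.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))      # Q[s, a] = estimated cost-to-go
    for _ in range(episodes):
        s = int(rng.integers(n_states))      # random start (a start at the goal ends immediately)
        while s != goal:
            # epsilon-greedy: explore with probability eps, otherwise take the cheapest action
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmin())
            s_next, cost = step(s, a)
            target = cost + (0.0 if s_next == goal else Q[s_next].min())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

Under the usual step-size and exploration conditions, and standard SSP assumptions such as properness, updates of this type converge almost surely, which is the kind of guarantee the abstract refers to.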


A Proof of Theorem 3.1

Neural Information Processing Systems

In this section, we prove Theorem 3.1, which says that it suffices to work with the augmented state space. First, we state the following lemma, Lemma A.2. Now, we prove Theorem 3.1: by Lemma A.2, we obtain the required bound on the CDF, and Theorem 3.1 follows straightforwardly from this result. Consider the same setup as in Lemma B.1; by Lemma B.1, we obtain the analogous bound.


Supplementary Materials Roadmap

Neural Information Processing Systems

In this supplementary material, we provide "full versions" of Sections 2-4 from the main submission. Fact 2.5 (uniform bound on the entries of a Gaussian vector): for g ∼ N(0, I_d), the normalized vector g/‖g‖ is identical in distribution to v. We can expand the expectation and apply Fact B.2. We will also need a stability result for affine linear thresholds. Putting all of these ingredients together, we can complete the proof of the main Lemma B.1 by applying Lemma B.6 to the projection of f onto the two-dimensional subspace. This notion is motivated by Lemma 4.4 in Section C.1, where we study the critical points. We first collect some elementary consequences of closeness. In the rest of the paper we take the relevant parameter to be small, so Lemma 3.3 will always apply.
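
The Gaussian fact quoted above, that g/‖g‖ has the same distribution as a uniformly random unit vector (which is presumably what v denotes), is easy to check numerically. The snippet below only illustrates this standard fact and is not part of the paper's construction; the dimension and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000                      # arbitrary dimension and sample size

# Draw n Gaussian vectors g ~ N(0, I_d) and normalize each to the unit sphere.
g = rng.standard_normal((n, d))
v = g / np.linalg.norm(g, axis=1, keepdims=True)

# By rotational invariance of the Gaussian, v is uniform on the sphere:
# its mean is ~0 and its second-moment matrix is ~I_d / d.
print(np.round(v.mean(axis=0), 3))     # approximately [0. 0. 0.]
print(np.round(v.T @ v / n, 3))        # approximately (1/3) * I_3
```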


af5baf594e9197b43c9f26f17b205e5b-Supplemental.pdf

Neural Information Processing Systems

Supplementary Material (Appendix) for "When Are Solutions Connected in Deep Networks?". Hence, (15) holds and the desired claim follows. Thus, by using assumption (A1) again, we can apply Corollary A.1, and the desired claim follows from Theorem 4.1. Note that where we apply Corollary A.1, the second condition follows from assumption (A1), so the application of Corollary A.1 is justified. This shows that the set of features formed by these neurons is linearly separable.


Refined Risk Bounds for Unbounded Losses via Transductive Priors

Qian, Jian, Rakhlin, Alexander, Zhivotovskiy, Nikita

arXiv.org Machine Learning

We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the magnitude of design vectors and the norm of the optimal vector of parameters. The key distinction from existing results lies in our assumption that the set of design vectors is known in advance (though their order is not), a setup sometimes referred to as transductive online learning. While this assumption seems similar to fixed design regression or denoising, we demonstrate that the sequential nature of our algorithms allows us to convert our bounds into statistical ones with random design without making any additional assumptions about the distribution of the design vectors--an impossibility for standard denoising results. Our key tools are based on the exponential weights algorithm with carefully chosen transductive (design-dependent) priors, which exploit the full horizon of the design vectors. Our classification regret bounds have a feature that is only attributed to bounded losses in the literature: they depend solely on the dimension of the parameter space and on the number of rounds, independent of the design vectors or the norm of the optimal solution. For linear regression with squared loss, we further extend our analysis to the sparse case, providing sparsity regret bounds that additionally depend on the magnitude of the response variables. We argue that these improved bounds are specific to the transductive setting and unattainable in the worst-case sequential setup. Our algorithms, in several cases, have polynomial time approximations and reduce to sampling with respect to log-concave measures instead of aggregating over hard-to-construct $\varepsilon$-covers of classes.
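
To make the abstract's key tool concrete, here is a minimal sketch of the generic exponential weights (aggregating) update over a finite expert set with an arbitrary prior. The transductive, design-dependent priors and the continuous parameter classes analyzed in the paper go beyond this illustration, so the finite expert set, the loss matrix, and the fixed learning rate below are all assumptions.

```python
import numpy as np

def exponential_weights(losses, prior, eta):
    """Generic exponential weights over a finite expert set.

    losses: array of shape (T, K); losses[t, k] is the loss of expert k at round t.
    prior:  array of shape (K,) with positive entries (e.g., design-dependent weights).
    eta:    fixed learning rate (assumed; tuning is problem-specific).
    Returns the sequence of distributions over experts played at each round, shape (T, K).
    """
    cum_loss = np.zeros_like(prior, dtype=float)
    dists = []
    for loss_t in losses:
        logw = np.log(prior) - eta * cum_loss   # prior reweighted by exponentiated cumulative loss
        logw -= logw.max()                      # stabilize before exponentiating
        w = np.exp(logw)
        dists.append(w / w.sum())               # play this distribution, then observe the losses
        cum_loss += loss_t
    return np.array(dists)
```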